Text and Image Metasearch on the Web

Authors

  • Steve Lawrence
  • C. Lee Giles
Abstract

As the Web continues to increase in size, the relative coverage of Web search engines is decreasing, and search tools that combine the results of multiple search engines are becoming more valuable. This paper provides details of the text and image metasearch functions of the Inquirus search engine developed at the NEC Research Institute. For text metasearch, we describe features including the use of link information in metasearch, and provide statistics on the usage and performance of Inquirus and the Web search engines. For image metasearch, Inquirus queries multiple image search engines on the Web, downloads the actual images, and creates image thumbnails for display to the user. Inquirus handles image search engines that return direct links to images, and engines that return links to HTML pages. For the engines that return HTML pages, Inquirus analyzes the text on the pages in order to predict which images are most likely to correspond to the query. The individual image search engines tend to excel at different classes of queries, and the combination of engines is surprisingly effective at finding images corresponding to a given query. Both the text and image metasearch functions of Inquirus are surprisingly fast, and we describe the parallel architecture of the engine that provides this efficiency.
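
The parallel querying and result combination described in the abstract can be illustrated with a minimal sketch. This is not the Inquirus implementation: the engine functions `engine_a` and `engine_b`, the example URLs, and the merge rule (rank by how many engines returned a URL, then by its best position) are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    # Hypothetical stand-in for a real search engine; returns ranked URLs.
    return ["http://example.com/1", "http://example.com/2"]


def engine_b(query):
    # Second hypothetical engine with partly overlapping results.
    return ["http://example.com/2", "http://example.com/3"]


def metasearch(query, engines):
    # Query all engines concurrently, so the total wait is roughly
    # that of the slowest engine rather than the sum of all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    # Merge by URL: count how many engines returned it and keep its best rank.
    merged = {}
    for results in result_lists:
        for rank, url in enumerate(results):
            votes, best = merged.get(url, (0, rank))
            merged[url] = (votes + 1, min(best, rank))

    # Assumed ordering: more engines first, then best rank, then URL.
    return sorted(merged, key=lambda u: (-merged[u][0], merged[u][1], u))


if __name__ == "__main__":
    for url in metasearch("inquirus", [engine_a, engine_b]):
        print(url)
```

A real metasearch engine would replace the stub functions with network requests and, as the abstract notes for Inquirus, could download the result pages or images themselves before ranking and display.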

Similar resources

A World Wide Web Based Image Search Engine Using Text and Image Content Features

Using both text and image content features, a hybrid image retrieval system for the World Wide Web is developed in this paper. We first use a text-based image metasearch engine to retrieve images from the World Wide Web based on the text information on the image host pages to provide an initial image set. Because of the high speed and low cost of the text-based approach, we can easily retriev...

Detection of Heterogeneities in a Multiple Text Database Environment

As the number of text retrieval systems (search engines) grows rapidly on the World Wide Web, there is an increasing need to build search brokers (metasearch engines) on top of them. Often, the task of building an effective and efficient metasearch engine is hindered by the heterogeneities among the underlying local search engines. In this paper, we first analyze the impact of various heterogeneitie...

Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space

Personalization of search has gained a lot of publicity in recent years. Personalization features in search and metasearch engines are a follow-up to the research done. On the other hand, text categorization methods have been successfully applied to document collections. Specifically, text categorization methods can support the task of classifying Web content in thematic hierarchies. Combining t...

Text Documents

The World Wide Web has become the largest information source in recent years, and search engines are indispensable tools for finding needed information from the Web. While modern search engine technology has its roots in text/information retrieval techniques, it also consists of solutions to unique problems arising from the Web such as web page crawling and utilizing linkage information to impr...

Analyse de la robustesse des algorithmes de méta-recherche discriminante

This paper studies the sensitivity of four metasearch engines under different situations. The focus of this analysis is on trainable metasearch engines. Our main contribution is a large scale systematic analysis of the performance and behavior of these methods on several corpora. Firstly, we analyze how the choice and normalization of the relevance score delivered by base search engines influen...

Publication date: 1999